Paper Review

Summary

  • A brief overview of Gaussian Processes and their pros/cons.
  • Ten step framework for setting up a “robust and well-specified” (GP) model.
  • Case study of interpolating glacial elevation data collected by satellites over Greenland.
  • More engineering advice than mathematical treatise.
    • ML oriented rather than statistical inference focussed.

Some GP equations

For data \(\{(\boldsymbol{x}_i, y_i); i = 1, \ldots, N\}\) with \(\boldsymbol{x}_i \in \mathbb{R}^d\) and \(y_i \in \mathbb{R}\),

\[y_i = f(\boldsymbol{x}_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2_n)\]

We propose a underlying function,

\[f(\cdot) \sim\mathcal{GP} \left( \mu(\boldsymbol{x};\boldsymbol\theta_\mu), k(\boldsymbol{x}, \boldsymbol{x'}; \boldsymbol{\theta}_k) \right)\]

where \(\mu(\cdot)\) is the mean function and \(k(\cdot)\) is the covariance kernel function, with hyperparameters \(\boldsymbol\theta_\mu\) and \(\boldsymbol\theta_k\), respectively.

Strengths & Limitations of GPs

Strengths

  • Well-calibrated uncertainty estimates
  • Handles data that is not IID
  • More interpretable than ML
  • Data-efficient when retraining (?)
    • More to do with Bayesian workflow than GPs.
  • Unlikely to fail (?)
    • Reliable as a subpart of a broader ML system.

Limitations

  • Training on large datasets (\(N \gtrsim 10^4\)) is prohibitive
    • Covariance Matrix inversion is \(\mathcal{O}(N^3)\)
  • Training on high-dimensional data (\(D \gtrsim 100\)) is challenging
    • Covariance function scales at \(\mathcal{O}(DN^2)\)
  • Risk of overfitting (?)
    • Complex kernel functions with many parameters.
  • Non-Gaussian distributions (?)
  • Model misspecification (?)

Framework

Step 1: Problem Definition

  1. [ThunderKAT] Interpolate sparsely sampled light curves in order to calculate power spectral density (PSD) and summary statistics for later use in source classification and identification.
  1. [Simulation] Understand the effect of downsampling on the accurate recovery of PSDs from light curves.

Step 2: Initial Data Exploration

  • ThunderKAT data has \(N = 6,394\) light curves, with \(n_i \in [1, 166]\) observations, ranging from \([1, 1296]\) days.

    • authors consider this small and warn against overfitting if a complex kernel is used.
  • Light curves are (usually) univariate and 1-dimensional so should not cause a computational bottleneck.

  • Simulated light curves have \(n_i = 500\) observations.

  • Estimating the uncertainties around PSDs and \(\theta\)s are crucial which favours using GPs.

Step 3: Domain Expertise

For light curves from MeerKAT:

  • sampled at approximately weekly cadences.
  • some observations probe down to \(1 \sigma\) sensitivity (\(\approx 1 \mu \textrm{Jy}\)).
  • down to the 8 second integration time of MeerKAT.

For the simulated datasets, I can set arbitrary values for parameter ground truths.

These need to enter to setting priors.

Step 4: Training, Validation, Test Sets

  • We don’t have the luxury of splitting already very sparse MeerKAT datasets.
    • Care must be taken to split time series anyway.
  • We can generate a ground truth baseline dataset in the simulation study.

Step 5: Scaling Structures

  • For time series data, the authors suggest mapping the GP to a Stochastic Differential Equation (SDE).
    • Not really applicable to sparsely sampled series like MeerKAT
  • This seems to be more a computational consideration rather than an inferential concern.
    • We’re trying to put GPs through its paces, not just get a nice prediction or interpolation result.
  • Perhaps relevant for later multi-wavelength analyses.

Step 6: Data Transformations

The authors suggest standardising the data, i.e., z-scores, to improve inference.

Step 7: Kernel Design

  • Need to consider: Smoothness, lengthscales, periodicity, outliers and tails, asymmetry, and stationarity.

  • Authors suggest “kernel composition”.

  • For simulation study: Add SE and periodic kernels since the physics suggests these components will be present additively.

  • For ThunderKAT: lengthscale priors should reflect the size of data gaps and the total length of light curve.

  • Transients are by definition not stationary!

Step 8: Model Iteration

  • Authors suggest setting up a simple non-GP baseline model for comparison.

  • Simulation study: Lomb-Scargle periodogram of a fully sampled simulated light curve.

    • Pilot simulation has shown the correct recovery of known hyperparameter values.
  • ThunderKAT analysis is still underway.

Case Study: Greenland Glacial Elevation Changes

Comments on Paper

  • Lists limitations but not much guidance on mitigation.
  • Claim about Non-Gaussian distributions is not correct; fixated on additive Gaussian noise/likelihoods.
  • Kernel design is crucial; advice hard to generalise.
  • Case Study doesn’t really demonstrate their framework.
  • Needs clarity on what constitutes “robust” or “well-specified”.
  • Really good bibliography of GP literature.
  • GPyTorch code is provided.

Takeaways

  • Consider standardising data before fitting GP.
  • Investigate what relevant “scaling structures” might be useful for when dimensionality increases for LSST data.
  • Consider what would be a good baseline non-GP comparator model for ThunderKAT study.